The folks at Azuro are committed to clock-tree synthesis. They say their company – founded in the U.K., but now headquartered in Silicon Valley – is pursuing a “fresh and different” approach.
Per Mark Swinnen, director of product marketing: “We’re the only company focused on clock-tree synthesis. No one else has been dealing with the problem, even though the clock is the most important aspect of timing for power in design; 30-to-50 percent [of chip power] is taken up by the clock net alone. It’s baffling that the industry has spent so little time on it, so this is Azuro’s opportunity. While all place-and-route tools have a freebie module for CTS, the industry overall has significantly under-invested in clock design for many years.
“Perhaps that’s not surprising, because the clock problem has crept up on us. Wires need buffers to distribute the signals around, but buffers mean that one transistor sees the signal sooner than others. As chips got bigger, balancing to compensate got more and more complicated. Everybody continued to think it was just about the wires, but the clocks got more complicated, as well. Designs [emerged] with over 100 clocks, highly interleaved with multiplexors to balance things as the clock signals diverged, converged, and re-diverged. As a result, today’s high-end chips have this horribly complicated spaghetti of clock design, with most clocks bumbling along as if you still just need wires to provide the buffers.
“So traditional clock gating continues, down at the RTL where clock gates are put in based on simple pattern recognition. If a block is not being used – for instance, the video circuitry in a cell phone – you place a switch on that clock net, so nothing moves. It’s an effective technique in particular cases, used ubiquitously in the industry, but it’s simplistic – replacing feedback multiplexors with clock gates, with no idea about placement, timing, or power. But, it only works if you don’t push it too far up the clock tree. At Azuro, we’re saying, Hang on there!
“Our PowerCentric tool inserts the clock tree farther up because it works at the gate level, a better level than RTL. Ours is a more sophisticated gating strategy, advanced clock gating, and is not just about pattern swapping in RTL. PowerCentric has full power analysis and a timing engine built in, allowing you to see the impact of timing, power, and area on the full chip because the tool works after placement. Where traditional clock gating is established at RTL, a very restricted view, we provide full detail about the chip, including placement and global routing. Some would argue that our strategy costs power by putting in additional clock gates, but it actually saves power. We see 25-to-40 percent savings over traditional clock gating.
“Meanwhile, we’re aware that designers all use place-and-route tools that include a CTS module, yet they’re still stuck with 10-to-12 weeks of balancing the clock tree by hand, an intense amount of manual labor. That’s why they are willing to pay for our tool, despite the free CTS module they already have. It’s easy to use and integrates with their place-and-route tools – taking Verilog, LEF and DEF in, and spitting Verilog, LEF and DEF out.
“The integration with those tools is simple and effective, so much so that we’re in the latest TSMC 9.0 reference flow. Think about it. Even though the TSMC flow is based on implementation, analysis and verification tools from all of the large EDA vendors, still we’re in the flow for CTS. That illustrates how important the design of the clock has become – a completely fundamental shift in the industry. It’s no longer enough to just work with ideal clocks and simple wire models, as in earlier generations of design tools. There are too many on-chip variations, and the logic path is too small in comparison to the clock path.
“At Azuro, we’ve solved the problem. People can still use logic synthesis tools that work with ideal timing models, but our customers can use PowerCentric for CTS to get higher performance, lower power, and lower area in their final design. And now, they can expand on that technology with our newly announced Rubix product, which includes a superset of PowerCentric’s features.
“Rubix is a physical optimization tool that optimizes the clock and the logic at the same time. It’s like a puzzle that’s taming these two things at the same time. The beauty of the tool is that it doesn’t require any changes at the front-end or at signoff. Of course, if you’re using Rubix, you don’t need PowerCentric, but we’ll continue to support both products. Gary Smith [Gary Smith EDA] has talked with us and says what we’re doing makes sense, that it’s obviously the right way to go. [Not surprisingly], we’ve already had our first tapeout at 40 nanometers.
“In the next few years, we expect all of the tool vendors in the industry will move in this direction, to clock-concurrent optimization. For now, Azuro will continue to make a good business with our CTS tools. It’s a new idea that’s definitely catching on.”
[Editor’s Note: You can learn more about the Azuro technology by reading their White Paper.]
___________________________________________________________
This article was first published in EDA Weekly on May 25, 2009.
___________________________________________________________
Peggy Aycinena owns and operates EDA Confidential:
peggy@aycinena.com